Exploiting Bilingual Information to Improve Web Search

نویسندگان

  • Wei Gao
  • John Blitzer
  • Ming Zhou
  • Kam-Fai Wong
چکیده

Web search quality can vary widely across languages, even for the same information need. We propose to exploit this variation in quality by learning a ranking function on bilingual queries: queries that appear in query logs for two languages but represent equivalent search interests. For a given bilingual query, along with corresponding monolingual query log and monolingual ranking, we generate a ranking on pairs of documents, one from each language. Then we learn a linear ranking function which exploits bilingual features on pairs of documents, as well as standard monolingual features. Finally, we show how to reconstruct monolingual ranking from a learned bilingual ranking. Using publicly available Chinese and English query logs, we demonstrate for both languages that our ranking technique exploiting bilingual data leads to significant improvements over a state-of-the-art monolingual ranking algorithm.

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

Exploiting the Web as the multilingual corpus for unknown query translation

Users’ cross-lingual queries to a digital library system might be short and the query terms may not be included in a common translation dictionary (unknown terms). In this paper, we investigate the feasibility of exploiting the Web as the multilingual corpus source to translate unknown query terms for cross-language information retrieval in digital libraries. We propose a Web-based term transla...

متن کامل

Using Uplug and SiteSeeker to construct a cross language search engine for Scandinavian languages

This paper presents how we adapted a website search engine for cross language information retrieval, using the Uplug word alignment tool for parallel corpora. We first studied the monolingual search queries posed by the visitors of the website of the Nordic council containing six different languages. In order to compare how well different types of bilingual dictionaries covered the most common ...

متن کامل

Learning Patterns from the Web to Translate Named Entities for Cross Language Information Retrieval

Named entity (NE) translation plays an important role in many applications. In this paper, we focus on translating NEs from Korean to Chinese to improve Korean-Chinese cross-language information retrieval (KCIR). The ideographic nature of Chinese makes NE translation difficult because one syllable may map to several Chinese characters. We propose a hybrid NE translation system. First, we integr...

متن کامل

Parallel Sentences Mining From The Web

Parallel sentences can benefit many NLP applications (e.g., machine translation, cross language information retrieval.) In this paper, the candidate bilingual webs pages are returned by submit sentence pairs to search engine and then validated by surface patterns. We propose an algorithm to candidate bilingual resource extraction and filter useless bilingual web pages. The pair sentences includ...

متن کامل

Using Uplug and SiteSeeker to construct a cross language search engine for Scandinavian

This paper presents how we adapted a website search engine for cross language information retrieval, using the Uplug word alignment tool for parallel corpora.We first studied the monolingual search queries posed by the visitors of the website of the Nordic council containing five different languages. In order to compare how well different types of bilingual dictionaries covered the most common ...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

عنوان ژورنال:

دوره   شماره 

صفحات  -

تاریخ انتشار 2009